Skip to content

connection: clean up failed heartbeat sends#876

Merged
dkropachev merged 1 commit into
scylladb:masterfrom
dkropachev:dk/heartbeat-send-failure-cleanup
May 10, 2026
Merged

connection: clean up failed heartbeat sends#876
dkropachev merged 1 commit into
scylladb:masterfrom
dkropachev:dk/heartbeat-send-failure-cleanup

Conversation

@dkropachev
Copy link
Copy Markdown
Collaborator

@dkropachev dkropachev commented May 6, 2026

Fixes #875.

Heartbeat sends can fail after a request id has already been reserved. This change keeps the request-id pool and in-flight accounting consistent across that failure path, and avoids double-releasing the slot on the control-connection branch.

Changes:

  • preserve heartbeat cleanup when send_msg() fails
  • only release the in-flight slot directly for control connections
  • add a regression test for a failed heartbeat send

@dkropachev dkropachev force-pushed the dk/heartbeat-send-failure-cleanup branch from 892831d to 358e016 Compare May 7, 2026 06:50
@dkropachev dkropachev self-assigned this May 7, 2026
@dkropachev dkropachev marked this pull request as ready for review May 7, 2026 10:15
@sylwiaszunejko sylwiaszunejko requested a review from Copilot May 7, 2026 11:12
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a heartbeat failure edge case where send_msg() can fail after a request id has been reserved, leaking the request id / callback registration and (for control connections) potentially leaving in_flight incorrectly incremented. It also adds a regression test to ensure request-id and in-flight bookkeeping is restored after a failed heartbeat send.

Changes:

  • Update HeartbeatFuture to unwind _requests registration and return the reserved request id when send_msg() fails.
  • Ensure in_flight is released directly only for control connections (since the control-connection owner’s return_connection() doesn’t decrement in_flight).
  • Add a unit test covering failed heartbeat send cleanup.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
cassandra/connection.py Adds failure-path cleanup in HeartbeatFuture for send_msg() exceptions (request id + _requests unwind; in_flight handling for control connections).
tests/unit/test_connection.py Adds regression test ensuring request id pool, _requests, and in_flight are consistent after a heartbeat send failure.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cassandra/connection.py
Comment thread cassandra/connection.py
Keep heartbeat request-id and in-flight bookkeeping consistent when send_msg() fails.\n\nHandle the control-connection in_flight release separately from HostConnection cleanup.
@dkropachev dkropachev force-pushed the dk/heartbeat-send-failure-cleanup branch from 358e016 to 6ad10b4 Compare May 7, 2026 13:55
@dkropachev dkropachev requested a review from Lorak-mmk May 7, 2026 14:24
@Lorak-mmk
Copy link
Copy Markdown

is this the right way to fix it?
On normal (non-exception path) the in_flight and other stuff is released somewhere by the connection (process_msg? Or some other method, I don't know). I think so because I see no other place in HeartbeatFuture where it is handled.
So perhaps it should be the same for the exception path? In other words, maybe send_msg should handle it?

@dkropachev
Copy link
Copy Markdown
Collaborator Author

is this the right way to fix it? On normal (non-exception path) the in_flight and other stuff is released somewhere by the connection (process_msg? Or some other method, I don't know). I think so because I see no other place in HeartbeatFuture where it is handled. So perhaps it should be the same for the exception path? In other words, maybe send_msg should handle it?

Can't agree more, there has to be an infrastructure that would handle request scheduling, executing, failing in respect of borrowing/returning request_id, tracking in_flight, etc.
It should be something like set of Future classes and proper API on connection to handle all the cases.
Unfortunately it is not like that and now in_flight, request_ids are messed around all over the code.
IMHO we should fix obvoius problems and then refactor whole area to make it sane.

@dkropachev dkropachev merged commit 51dd366 into scylladb:master May 10, 2026
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

connection: clean up failed heartbeat sends

3 participants